test(integration): expand numeric ORE ordering tests with edge cases#377
test(integration): expand numeric ORE ordering tests with edge cases#377
Conversation
Replace duplicated numeric order tests with a generic helper that inserts values in interleaved (zigzag) order and verifies ORDER BY. Increases per-test coverage from 3 to 10 values including negatives, zero, and boundary values. Adds missing int4 DESC, int8 DESC, and float8 ASC/DESC tests (16 → 20 total tests).
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds reusable ORE (order-revealing encryption) ORDER BY test helpers, a generic type‑parametrized ordering test, and a multitenant test module that runs those ordering scenarios across tenant keysets; also adds an interleaved index helper and wires the new modules into the test crate. (33 words) Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| /// `values` must be provided in ascending sorted order. | ||
| /// Values are inserted in interleaved (non-sorted) order, then verified | ||
| /// via ORDER BY in the given direction. | ||
| async fn map_ore_order_generic<T>(col_name: &str, values: Vec<T>, direction: &str) |
There was a problem hiding this comment.
Do these tests run with the same key every time? My guess is that they would be.
To test ORE behaviour definitively, the tests should run with many different keys.
There was a problem hiding this comment.
When an ORE value is encrypted, it uses a key.
2 encrypted values will compare correctly if they were encrypted with the same key.
If they were encrypted with different keys then they will compare correctly 50% of the time.
I'm suggesting that the tests need to encrypt a bunch of values with key a, check that they compare correctly then do it again with a different key, b and make sure they still compare correctly.
One way to do this would be to randomly generate the key used for every test run so that we build confidence over many CI runs. Another way would be to use property testing to do many runs in a single CI run.
There was a problem hiding this comment.
@coderdan the Proxy tests run using a Workspace that survives across test runs - which means they would be using the same Keyset across test runs.
Given that a broken ORE integration with Proxy could false positive "pass" an assertion of a comparison op with 50/50 odds, and the ORE is deterministic with respect to the keyset material I can see that it would in principal be possible to be convinced something is working when it isn't.
But even though the key is unchanging between test runs we are encrypting multiple different values in this test and ensuring they sort correctly. I would have thought that if ORE integration was broken then the odds of a false positive would halve for every seperate value in the values being sorted?
There was a problem hiding this comment.
One way to do this in the proxy context would be to run the test for multiple different keysets.
There was a problem hiding this comment.
@coderdan @freshtonic
Added ore ordering to the multitenant test context:
packages/cipherstash-proxy-integration/src/multitenant/ore_order.rs
Runs the tests against 3 tenant keysets.
Original ORE tests remain, and they test the default keyset.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
packages/cipherstash-proxy-integration/src/multitenant/ore_order.rs (3)
30-48: Duplicated utility function.
interleaved_indicesis duplicated verbatim frommap_ore_index_order.rs. Consider extracting it to thecrate::commonmodule to avoid duplication.♻️ Extract to common module
In
crate::common:/// Returns indices in zigzag order so insertion is never accidentally sorted. /// For len=5: [4, 0, 3, 1, 2] pub fn interleaved_indices(len: usize) -> Vec<usize> { // ... implementation }Then import in both test files:
use crate::common::interleaved_indices;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/cipherstash-proxy-integration/src/multitenant/ore_order.rs` around lines 30 - 48, The function interleaved_indices is duplicated; extract it into a shared module (e.g., crate::common) as a public function pub fn interleaved_indices(len: usize) -> Vec<usize> with the same implementation, then replace the local definitions in multitenant::ore_order.rs and map_ore_index_order.rs by importing use crate::common::interleaved_indices; and remove the duplicate implementations so both files call the single shared function.
16-22: SQL injection risk inconnect_as_tenantwith string interpolation.The
keyset_idis interpolated directly into the SQL string. While the value comes from environment variables (which should be UUIDs), the proxy validates the UUID format server-side (permaybe_set_keyset_idin context snippet 1). However, using parameterized queries would be more defensive.That said, since this is test code and the values are from controlled environment variables, the risk is minimal.
♻️ Optional: Use EXECUTE for parameterized SET
If you want to be extra defensive, you could validate the UUID format before interpolation:
async fn connect_as_tenant(keyset_id: &str) -> tokio_postgres::Client { let client = connect_with_tls(PROXY).await; + // Validate UUID format before interpolation + uuid::Uuid::parse_str(keyset_id) + .expect("keyset_id must be a valid UUID"); let sql = format!("SET CIPHERSTASH.KEYSET_ID = '{keyset_id}'"); client.execute(&sql, &[]).await.unwrap(); client }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/cipherstash-proxy-integration/src/multitenant/ore_order.rs` around lines 16 - 22, The connect_as_tenant function currently interpolates keyset_id into the SQL string which can lead to injection; instead, validate the keyset_id as a UUID (e.g., use uuid::Uuid::parse_str) and/or issue the SET using a parameterized execution via tokio_postgres (reference connect_as_tenant, connect_with_tls, PROXY and the "SET CIPHERSTASH.KEYSET_ID" statement) so the value is either rejected if not a valid UUID or passed as a query parameter rather than concatenated into the SQL.
50-436: Test helpers are largely duplicated frommap_ore_index_order.rs.The helper functions (
ore_order_text,ore_order_generic, etc.) duplicate logic frommap_ore_index_order.rs. The key difference is that these accept a&Clientparameter while the original tests create their own client.This duplication is understandable given the different test structures, but if maintainability becomes a concern, you could refactor
map_ore_index_order.rsto also use client-accepting helpers.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/cipherstash-proxy-integration/src/multitenant/ore_order.rs` around lines 50 - 436, Several helper tests (ore_order_text, ore_order_text_desc, ore_order_nulls_last_by_default, ore_order_nulls_first, ore_order_qualified_column, ore_order_qualified_column_with_alias, ore_order_no_eql_column_in_select_projection, ore_order_plaintext_column, ore_order_plaintext_and_eql_columns, ore_order_simple_protocol, ore_order_generic) are duplicated from map_ore_index_order.rs; extract the shared implementations into a single test helper module that accepts &tokio_postgres::Client (e.g., tests::ore_order_helpers) and move the bodies of these functions there, then replace the duplicates in both files with calls to the new helpers (keep the same function names/signatures so call sites only need imports), update imports/visibility, and remove the duplicated definitions to avoid code drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/cipherstash-proxy-integration/src/multitenant/ore_order.rs`:
- Line 326: Rename the misspelled variable s_enctrypted_one to s_encrypted_one
and update all usages accordingly (the declaration currently named
s_enctrypted_one and its references later in the file, e.g., where it is used
alongside s_encrypted_two and s_encrypted_three) so the name matches the correct
spelling and the other encrypted variables; ensure any pattern matches or tests
referencing s_enctrypted_one are updated to s_encrypted_one as well.
---
Nitpick comments:
In `@packages/cipherstash-proxy-integration/src/multitenant/ore_order.rs`:
- Around line 30-48: The function interleaved_indices is duplicated; extract it
into a shared module (e.g., crate::common) as a public function pub fn
interleaved_indices(len: usize) -> Vec<usize> with the same implementation, then
replace the local definitions in multitenant::ore_order.rs and
map_ore_index_order.rs by importing use crate::common::interleaved_indices; and
remove the duplicate implementations so both files call the single shared
function.
- Around line 16-22: The connect_as_tenant function currently interpolates
keyset_id into the SQL string which can lead to injection; instead, validate the
keyset_id as a UUID (e.g., use uuid::Uuid::parse_str) and/or issue the SET using
a parameterized execution via tokio_postgres (reference connect_as_tenant,
connect_with_tls, PROXY and the "SET CIPHERSTASH.KEYSET_ID" statement) so the
value is either rejected if not a valid UUID or passed as a query parameter
rather than concatenated into the SQL.
- Around line 50-436: Several helper tests (ore_order_text, ore_order_text_desc,
ore_order_nulls_last_by_default, ore_order_nulls_first,
ore_order_qualified_column, ore_order_qualified_column_with_alias,
ore_order_no_eql_column_in_select_projection, ore_order_plaintext_column,
ore_order_plaintext_and_eql_columns, ore_order_simple_protocol,
ore_order_generic) are duplicated from map_ore_index_order.rs; extract the
shared implementations into a single test helper module that accepts
&tokio_postgres::Client (e.g., tests::ore_order_helpers) and move the bodies of
these functions there, then replace the duplicates in both files with calls to
the new helpers (keep the same function names/signatures so call sites only need
imports), update imports/visibility, and remove the duplicated definitions to
avoid code drift.
ℹ️ Review info
Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c59d4d9a-07a5-4fd2-a501-9399190755dc
📒 Files selected for processing (3)
packages/cipherstash-proxy-integration/src/map_ore_index_order.rspackages/cipherstash-proxy-integration/src/multitenant/mod.rspackages/cipherstash-proxy-integration/src/multitenant/ore_order.rs
packages/cipherstash-proxy-integration/src/multitenant/ore_order.rs
Outdated
Show resolved
Hide resolved
…keysets Generate 54 tests (18 per tenant × 3 keysets) covering text ASC/DESC, NULLs, qualified columns, aliases, projection, plaintext, mixed columns, simple protocol, and numeric types (int2/int4/int8/float8 ASC/DESC). Uses a macro to generate per-tenant submodules without extra dependencies.
fa1fb37 to
bdd5d63
Compare
Extract shared test logic from map_ore_index_order.rs and multitenant/ore_order.rs into ore_order_helpers.rs to eliminate code duplication. Move interleaved_indices into crate::common. Add UUID validation to connect_as_tenant to prevent SQL injection.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/cipherstash-proxy-integration/src/ore_order_helpers.rs (1)
368-394: Make sort direction type-safe inore_order_generic.Using a free-form
&strmeans only exact"DESC"reversesexpected; other variants silently behave like ASC in assertions. A small enum avoids this class of mistakes.♻️ Proposed refactor
+pub enum SortDirection { + Asc, + Desc, +} + +impl SortDirection { + fn as_sql(&self) -> &'static str { + match self { + SortDirection::Asc => "ASC", + SortDirection::Desc => "DESC", + } + } +} + pub async fn ore_order_generic<T>( client: &tokio_postgres::Client, col_name: &str, values: Vec<T>, - direction: &str, + direction: SortDirection, ) where for<'a> T: Clone + PartialEq + ToSql + Sync + FromSql<'a> + PartialOrd + Debug, { @@ - let select_sql = format!("SELECT {col_name} FROM encrypted ORDER BY {col_name} {direction}"); + let select_sql = format!( + "SELECT {col_name} FROM encrypted ORDER BY {col_name} {}", + direction.as_sql() + ); @@ - let expected: Vec<T> = if direction == "DESC" { - values.into_iter().rev().collect() - } else { - values - }; + let expected: Vec<T> = match direction { + SortDirection::Desc => values.into_iter().rev().collect(), + SortDirection::Asc => values, + };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/cipherstash-proxy-integration/src/ore_order_helpers.rs` around lines 368 - 394, Change the loose &str "direction" in ore_order_generic to a small enum (e.g., enum SortDirection { Asc, Desc }) and update the function signature and call sites to accept that enum; use a match on SortDirection to build the ORDER BY clause string for select_sql and to decide whether to reverse expected (match SortDirection::Desc => values.into_iter().rev().collect(), SortDirection::Asc => values). Update any places constructing the SQL string (the select_sql format) to convert the enum to the correct SQL token and keep the rest of ore_order_generic (inserts, query, row extraction) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/cipherstash-proxy-integration/src/multitenant/ore_order.rs`:
- Around line 37-217: Tests truncate shared tables without serialization,
causing race conditions when run in parallel; annotate each async test function
(e.g., multitenant_ore_order_text, multitenant_ore_order_int4_desc,
multitenant_ore_order_float8, etc.) with the serial_test attribute
(#[serial_test::serial]) or otherwise enforce single-threaded execution so
truncates and inserts cannot interleave; add the attribute above every
#[tokio::test] in this file (or alternatively ensure CI sets RUST_TEST_THREADS=1
or implement a DB lock helper used by these functions) to serialize access to
the shared `encrypted`/`plaintext` tables.
---
Nitpick comments:
In `@packages/cipherstash-proxy-integration/src/ore_order_helpers.rs`:
- Around line 368-394: Change the loose &str "direction" in ore_order_generic to
a small enum (e.g., enum SortDirection { Asc, Desc }) and update the function
signature and call sites to accept that enum; use a match on SortDirection to
build the ORDER BY clause string for select_sql and to decide whether to reverse
expected (match SortDirection::Desc => values.into_iter().rev().collect(),
SortDirection::Asc => values). Update any places constructing the SQL string
(the select_sql format) to convert the enum to the correct SQL token and keep
the rest of ore_order_generic (inserts, query, row extraction) unchanged.
ℹ️ Review info
Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f5a81e9d-4d8d-4f07-98e3-eae1afacf6cb
📒 Files selected for processing (5)
packages/cipherstash-proxy-integration/src/common.rspackages/cipherstash-proxy-integration/src/lib.rspackages/cipherstash-proxy-integration/src/map_ore_index_order.rspackages/cipherstash-proxy-integration/src/multitenant/ore_order.rspackages/cipherstash-proxy-integration/src/ore_order_helpers.rs
…enum Add serial_test dependency and #[serial] attribute to all ORE ordering tests to prevent race conditions from concurrent clear() calls on shared tables. Replace direction string literals with a type-safe SortDirection enum in ore_order_generic.
There was a problem hiding this comment.
🧹 Nitpick comments (3)
packages/cipherstash-proxy-integration/src/map_ore_index_order.rs (1)
104-220: Consider centralizing shared numeric fixtures.The same test value vectors are repeated across this file and
packages/cipherstash-proxy-integration/src/multitenant/ore_order.rs. Extracting shared constants reduces drift and future update misses.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/cipherstash-proxy-integration/src/map_ore_index_order.rs` around lines 104 - 220, Tests repeat identical numeric vectors across multiple test functions (e.g., map_ore_order_int2, map_ore_order_int4, map_ore_order_int8, map_ore_order_float8 and their _desc variants) and across another test file; extract these fixtures into shared constants in a single place (for example add pub const/ static arrays in ore_order_helpers.rs or a new test_fixtures module) and replace the inline Vec literals with references to those constants (e.g., use ENCRYPTED_INT2_VALUES, ENCRYPTED_INT4_VALUES, ENCRYPTED_INT8_VALUES, ENCRYPTED_FLOAT8_VALUES) so both this file and the multitenant/ore_order.rs can import and reuse them.packages/cipherstash-proxy-integration/src/ore_order_helpers.rs (1)
392-403: Constrain dynamic column SQL to an allowlist type.Line 392 and Line 402 interpolate
col_namedirectly into SQL. It is currently safe at callsites, but this helper ispuband easy to misuse later. Prefer a typed column enum with fixed SQL identifiers.Refactor sketch
+pub enum OreOrderColumn { + EncryptedInt2, + EncryptedInt4, + EncryptedInt8, + EncryptedFloat8, +} + +impl OreOrderColumn { + fn as_sql_ident(&self) -> &'static str { + match self { + OreOrderColumn::EncryptedInt2 => "encrypted_int2", + OreOrderColumn::EncryptedInt4 => "encrypted_int4", + OreOrderColumn::EncryptedInt8 => "encrypted_int8", + OreOrderColumn::EncryptedFloat8 => "encrypted_float8", + } + } +} + pub async fn ore_order_generic<T>( client: &tokio_postgres::Client, - col_name: &str, + col: OreOrderColumn, values: Vec<T>, direction: SortDirection, ) where for<'a> T: Clone + PartialEq + ToSql + Sync + FromSql<'a> + PartialOrd + Debug, { + let col_name = col.as_sql_ident(); let insert_sql = format!("INSERT INTO encrypted (id, {col_name}) VALUES ($1, $2)"); ... }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/cipherstash-proxy-integration/src/ore_order_helpers.rs` around lines 392 - 403, The code interpolates col_name directly into SQL (used in insert_sql and select_sql) which is unsafe for a public helper; replace the dynamic string with a typed Column enum (e.g., pub enum Column { Foo, Bar, ... }) and give it a method sql_name(&self) -> &'static str that returns the fixed identifier for each variant; change this helper's signature to accept Column instead of a string, call col.sql_name() when building insert_sql and select_sql, and update all callsites (and any tests) to pass the enum; keep direction.as_sql() as-is and do not interpolate any other unchecked identifiers.packages/cipherstash-proxy-integration/src/multitenant/ore_order.rs (1)
21-22: Prefer parameterized session config over formatted SQL.Line 21 and Line 22 build a SQL string for
SET. Even with UUID validation, usingset_config(..., $1, ...)avoids manual SQL construction and keeps this helper safer to evolve.Refactor sketch
- let sql = format!("SET CIPHERSTASH.KEYSET_ID = '{keyset_id}'"); - client.execute(&sql, &[]).await.unwrap(); + client + .execute( + "SELECT set_config('cipherstash.keyset_id', $1, false)", + &[&keyset_id], + ) + .await + .unwrap();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/cipherstash-proxy-integration/src/multitenant/ore_order.rs` around lines 21 - 22, Replace the formatted SET statement that builds sql with a parameterized call to set_config to avoid manual SQL construction: instead of creating sql via format!("SET CIPHERSTASH.KEYSET_ID = '{keyset_id}'") and calling client.execute(&sql, &[]).await.unwrap(), call client.execute with a parameterized query that invokes set_config('CIPHERSTASH.KEYSET_ID', $1, false) (or equivalent) and pass keyset_id as the parameter; keep references to the existing keyset_id variable and the client.execute invocation so the change is localized to that helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/cipherstash-proxy-integration/src/map_ore_index_order.rs`:
- Around line 104-220: Tests repeat identical numeric vectors across multiple
test functions (e.g., map_ore_order_int2, map_ore_order_int4,
map_ore_order_int8, map_ore_order_float8 and their _desc variants) and across
another test file; extract these fixtures into shared constants in a single
place (for example add pub const/ static arrays in ore_order_helpers.rs or a new
test_fixtures module) and replace the inline Vec literals with references to
those constants (e.g., use ENCRYPTED_INT2_VALUES, ENCRYPTED_INT4_VALUES,
ENCRYPTED_INT8_VALUES, ENCRYPTED_FLOAT8_VALUES) so both this file and the
multitenant/ore_order.rs can import and reuse them.
In `@packages/cipherstash-proxy-integration/src/multitenant/ore_order.rs`:
- Around line 21-22: Replace the formatted SET statement that builds sql with a
parameterized call to set_config to avoid manual SQL construction: instead of
creating sql via format!("SET CIPHERSTASH.KEYSET_ID = '{keyset_id}'") and
calling client.execute(&sql, &[]).await.unwrap(), call client.execute with a
parameterized query that invokes set_config('CIPHERSTASH.KEYSET_ID', $1, false)
(or equivalent) and pass keyset_id as the parameter; keep references to the
existing keyset_id variable and the client.execute invocation so the change is
localized to that helper.
In `@packages/cipherstash-proxy-integration/src/ore_order_helpers.rs`:
- Around line 392-403: The code interpolates col_name directly into SQL (used in
insert_sql and select_sql) which is unsafe for a public helper; replace the
dynamic string with a typed Column enum (e.g., pub enum Column { Foo, Bar, ...
}) and give it a method sql_name(&self) -> &'static str that returns the fixed
identifier for each variant; change this helper's signature to accept Column
instead of a string, call col.sql_name() when building insert_sql and
select_sql, and update all callsites (and any tests) to pass the enum; keep
direction.as_sql() as-is and do not interpolate any other unchecked identifiers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c563bee4-c01d-4ac6-bac0-6ae6a713bf0b
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
packages/cipherstash-proxy-integration/Cargo.tomlpackages/cipherstash-proxy-integration/src/map_ore_index_order.rspackages/cipherstash-proxy-integration/src/multitenant/ore_order.rspackages/cipherstash-proxy-integration/src/ore_order_helpers.rs
|
|
||
| /// Text DESC ordering. | ||
| pub async fn ore_order_text_desc(client: &tokio_postgres::Client) { | ||
| let s_one = "a"; |
There was a problem hiding this comment.
Might be worth testing that lexicographic sort is correct.
Eg "cat" comes before "catalogue"
Replace duplicated numeric order tests with a generic helper that inserts values in interleaved (zigzag) order and verifies ORDER BY. Increases per-test coverage from 3 to 10 values including negatives, zero, and boundary values. Adds missing int4 DESC, int8 DESC, and float8 ASC/DESC tests (16 → 20 total tests).
Acknowledgment
By submitting this pull request, I confirm that CipherStash can use, modify, copy, and redistribute this contribution, under the terms of CipherStash's choice.
Summary by CodeRabbit
Tests
Chores